Drawing conclusions

Elizabeth King
Kevin Middleton

Next steps

  • Are these models “good”?
    • Posterior predictive checks and highest density intervals
  • Are parameter estimates credibly different from zero?
    • Regions of practical equivalence
  • Which model is “better”?
    • Model comparison

Making decisions with data

  • Bayesian inference doesn’t solve the “P-value” problem
  • Null hypothesis significance testing (NHST) can still be done
    • Whether it should be is a separate question

Convenience tools

  • rethinking and ulam() are great for learning
    • Explicitly ask for everything
    • Must set priors
  • We’ll start using the brms package
    • Automates some aspects of model fitting
    • Uses R model syntax
    • Converts factors
    • Sets default flat priors (so you should set your own)
    • Easier to work with posteriors (tidybayes, bayestestR)

Load the Snakes data

tibble [112 × 4] (S3: tbl_df/tbl/data.frame)
 $ ForagingMode: Factor w/ 2 levels "Active","Ambush": 2 1 2 2 1 2 2 2 2 1 ...
 $ SVL         : num [1:112] 83.7 48.7 174.2 44 87.3 ...
 $ RPM         : num [1:112] 0.301 0.33 0.075 0.33 0.0237 0.19 0.33 0.27 0.19 0.12 ...
 $ log10SVL    : num [1:112] 1.92 1.69 2.24 1.64 1.94 ...
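
The loading step isn’t shown on the slide; a minimal sketch that would produce the structure above (the file name "Snakes.csv" is an assumption):

```r
library(tidyverse)

# Hypothetical loading code -- the file name is assumed.
# Convert ForagingMode to a factor and add log10 of snout-vent length (SVL).
Snakes <- read_csv("Snakes.csv", show_col_types = FALSE) |>
  mutate(ForagingMode = factor(ForagingMode),
         log10SVL = log10(SVL))
str(Snakes)
```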

Fit two models with brm()

  • Use lm() model syntax
    • The - 1 in the additive formula drops the global intercept, so each level of ForagingMode gets its own coefficient
  • Set a prior for all b parameters: intercepts and slopes
    • Customize later
library(brms)
options(brms.backend = "cmdstanr")

fm_add <- brm(RPM ~ ForagingMode + log10SVL - 1, data = Snakes,
              prior = set_prior("normal(0, 0.5)", class = "b"), iter = 5e3)

fm_inter <- brm(RPM ~ ForagingMode * log10SVL, data = Snakes,
              prior = set_prior("normal(0, 0.5)", class = "b"), iter = 5e3)

Prior summary

prior_summary(fm_add)
                prior class               coef group resp dpar nlpar lb ub       source
       normal(0, 0.5)     b                                                        user
       normal(0, 0.5)     b ForagingModeActive                             (vectorized)
       normal(0, 0.5)     b ForagingModeAmbush                             (vectorized)
       normal(0, 0.5)     b           log10SVL                             (vectorized)
 student_t(3, 0, 2.5) sigma                                           0         default

Model Summaries: Additive model

summary(fm_add)
 Family: gaussian 
  Links: mu = identity; sigma = identity 
Formula: RPM ~ ForagingMode + log10SVL - 1 
   Data: Snakes (Number of observations: 112) 
  Draws: 4 chains, each with iter = 5000; warmup = 2500; thin = 1;
         total post-warmup draws = 10000

Population-Level Effects: 
                   Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
ForagingModeActive     0.13      0.10    -0.07     0.32 1.00     1950     2475
ForagingModeAmbush     0.22      0.10     0.02     0.43 1.00     1971     2571
log10SVL               0.01      0.05    -0.10     0.12 1.00     1921     2542

Family Specific Parameters: 
      Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma     0.12      0.01     0.10     0.14 1.00     3610     3682

Draws were sampled using sample(hmc). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).

Model Summaries: Interaction model

summary(fm_inter)
 Family: gaussian 
  Links: mu = identity; sigma = identity 
Formula: RPM ~ ForagingMode * log10SVL 
   Data: Snakes (Number of observations: 112) 
  Draws: 4 chains, each with iter = 5000; warmup = 2500; thin = 1;
         total post-warmup draws = 10000

Population-Level Effects: 
                            Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept                       0.32      0.14     0.05     0.59 1.00     3897     4981
ForagingModeAmbush             -0.28      0.19    -0.64     0.09 1.00     3185     4325
log10SVL                       -0.10      0.07    -0.25     0.05 1.00     3829     4912
ForagingModeAmbush:log10SVL     0.20      0.10     0.00     0.40 1.00     3142     4162

Family Specific Parameters: 
      Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma     0.12      0.01     0.10     0.13 1.00     6046     5426

Draws were sampled using sample(hmc). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).

Plot posteriors

bayesplot has many functions for working with models.

library(bayesplot)
color_scheme_set(scheme = "red")

post <- as_draws_array(fm_add)

mcmc_combo(fm_add, regex_pars = "b",
           combo = c("dens_overlay", "rank_overlay"))

Plot posteriors

(figure: density and rank-overlay plots for the b parameters of fm_add)

Posterior predictive checks

Plot the distribution of RPM and a sample of 100 draws from the posterior predictive distribution (\(y_{rep}\))

pp_check(fm_add, ndraws = 100)

Posterior predictive checks

(figure: density of observed RPM overlaid with 100 posterior predictive draws)

Posterior predictive checks

Plot histograms of the means of posterior predictive samples, separately for each level of ForagingMode, with the observed mean of RPM for each group.

pp_check(fm_add, type = "stat_grouped", group = "ForagingMode",
         stat = "mean",
         ndraws = 500)
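
The logic of this grouped-statistic check can be sketched in base R with made-up numbers (stand-ins, not the fitted model): for each replicated data set, compute the statistic and compare its distribution to the observed value.

```r
# Toy sketch of a posterior predictive check on a statistic (values made up)
set.seed(3)
y <- rnorm(50, mean = 0.2, sd = 0.12)                    # stand-in for observed RPM
yrep <- replicate(500, rnorm(50, mean = 0.2, sd = 0.12)) # stand-in for y_rep draws
rep_means <- colMeans(yrep)                              # mean of each replicate

# If the model is adequate, mean(y) falls well inside the rep_means distribution
mean(y)
quantile(rep_means, c(0.055, 0.945))
```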

Posterior predictive checks

(figure: histograms of posterior predictive means, grouped by ForagingMode, with the observed mean for each group)

Do foraging modes differ in mean RPM?

Three approaches

  1. Compare credible intervals of posteriors for Active and Ambush
  2. Compare the difference in the posteriors to 0
  3. Compare the difference in the posteriors to 0 \(\pm\) some small value

More context and practice in Unit 5

HPDI bands

Predict across a range of log10SVL for each level of ForagingMode.

pred_values <- crossing(
  log10SVL = seq(1.3, 2.5, length.out = 100),
  ForagingMode = levels(Snakes$ForagingMode)
)
pred_values
# A tibble: 200 × 2
   log10SVL ForagingMode
      <dbl> <chr>       
 1     1.3  Active      
 2     1.3  Ambush      
 3     1.31 Active      
 4     1.31 Ambush      
 5     1.32 Active      
 6     1.32 Ambush      
 7     1.34 Active      
 8     1.34 Ambush      
 9     1.35 Active      
10     1.35 Ambush      
# … with 190 more rows

Two kinds of posterior intervals

  1. HDIs for the parameter estimates
    • Credible ranges for expected values
    • posterior_epred()
  2. HDIs for new or observed values (“posterior predictive distribution”)
    • Include the uncertainty (\(\sigma\))
    • Wider than expected values intervals
    • posterior_predict()
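
The distinction can be illustrated with a base-R sketch (made-up values, not the fitted model): expected-value draws carry only the uncertainty in \(\mu\), while predictive draws add the residual noise \(\sigma\).

```r
# Toy illustration: epred-style draws vs. posterior predictive draws
set.seed(42)
mu_draws <- rnorm(10000, mean = 0.2, sd = 0.05)   # posterior uncertainty in mu
sigma <- 0.12                                     # residual SD (fixed for the sketch)

epred_draws <- mu_draws                           # expected values: posterior_epred()
pred_draws  <- mu_draws + rnorm(10000, 0, sigma)  # new observations: posterior_predict()

# Predictive intervals are wider because they include sigma
c(sd(epred_draws), sd(pred_draws))
```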

Predicting with brms

p_pred <- posterior_epred(fm_add, newdata = pred_values)
str(p_pred)
 num [1:10000, 1:200] 0.161 0.152 0.17 0.158 0.163 ...
  • 10,000 draws for each of 200 combinations of log10SVL and ForagingMode.
  • Each column is 1 row of pred_values.

Computing credible intervals

  • Calculate median and 89% quantile interval (\(\sim\) HDI)
  • apply() quantile() across the columns (MARGIN = 2) of p_pred
pred_values <- pred_values |> 
  mutate(Q50 = apply(p_pred, MARGIN = 2, FUN = quantile, prob = 0.5),
         Q5.5 = apply(p_pred, MARGIN = 2, FUN = quantile, prob = 0.055),
         Q94.5 = apply(p_pred, MARGIN = 2, FUN = quantile, prob = 0.945))
pred_values

Computing credible intervals

# A tibble: 200 × 5
   log10SVL ForagingMode   Q50   Q5.5 Q94.5
      <dbl> <chr>        <dbl>  <dbl> <dbl>
 1     1.3  Active       0.137 0.0885 0.187
 2     1.3  Ambush       0.231 0.172  0.289
 3     1.31 Active       0.137 0.0894 0.186
 4     1.31 Ambush       0.231 0.173  0.288
 5     1.32 Active       0.137 0.0905 0.185
 6     1.32 Ambush       0.231 0.174  0.288
 7     1.34 Active       0.138 0.0915 0.185
 8     1.34 Ambush       0.231 0.175  0.287
 9     1.35 Active       0.138 0.0925 0.184
10     1.35 Ambush       0.231 0.176  0.286
# … with 190 more rows
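
The apply()/quantile() pattern above is generic; a self-contained sketch on a toy draws-by-predictions matrix (numbers made up):

```r
# Toy draws matrix: 1000 posterior draws (rows) for 3 predictions (columns)
set.seed(1)
draws <- cbind(rnorm(1000, 0.1, 0.02),
               rnorm(1000, 0.2, 0.02),
               rnorm(1000, 0.3, 0.02))

# Median and 89% quantile interval for each column (prediction)
Q50   <- apply(draws, MARGIN = 2, FUN = quantile, prob = 0.5)
Q5.5  <- apply(draws, MARGIN = 2, FUN = quantile, prob = 0.055)
Q94.5 <- apply(draws, MARGIN = 2, FUN = quantile, prob = 0.945)

rbind(Q5.5, Q50, Q94.5)  # one interval per prediction
```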

Plotting credible intervals

ggplot() +
  geom_point(data = Snakes, aes(log10SVL, RPM, color = ForagingMode),
             size = 3) +
  geom_ribbon(data = pred_values,
              aes(x = log10SVL, ymin = Q5.5, ymax = Q94.5,
                  fill = ForagingMode), alpha = 0.25) +
  geom_line(data = pred_values,
            aes(x = log10SVL, y = Q50, color = ForagingMode)) +
  scale_color_manual(values = c("red", "blue")) +
  scale_fill_manual(values = c("red", "blue"))

Plotting credible intervals

(figure: RPM vs. log10SVL with median predictions and 89% ribbons for each ForagingMode)

Compare HDIs

post <- as_draws_df(post) |> as_tibble()

library(bayestestR)

hdi(post$b_ForagingModeActive, ci = 0.89)
89% HDI: [-0.03, 0.28]
hdi(post$b_ForagingModeAmbush, ci = 0.89)
89% HDI: [0.06, 0.39]

Compare credible intervals

  • Highest density vs. equal tails
ci(post[, 1:4], method = "HDI", ci = 0.89)
Highest Density Interval

Parameter            |       89% HDI
------------------------------------
b_ForagingModeActive | [-0.03, 0.28]
b_ForagingModeAmbush | [ 0.06, 0.39]
b_log10SVL           | [-0.08, 0.09]
sigma                | [ 0.11, 0.13]
ci(post[, 1:4], method = "ETI", ci = 0.89)
Equal-Tailed Interval

Parameter            |       89% ETI
------------------------------------
b_ForagingModeActive | [-0.03, 0.28]
b_ForagingModeAmbush | [ 0.06, 0.39]
b_log10SVL           | [-0.08, 0.09]
sigma                | [ 0.11, 0.13]

HDI of the difference between Ambush and Active

Use mutate() to calculate the difference for each sample

post <- post |> 
  mutate(d = b_ForagingModeAmbush - b_ForagingModeActive)
post |> select(1:3, 10)
# A tibble: 10,000 × 4
   b_ForagingModeActive b_ForagingModeAmbush b_log10SVL      d
                  <dbl>                <dbl>      <dbl>  <dbl>
 1              0.166                 0.235    -0.00413 0.0687
 2              0.197                 0.309    -0.0344  0.112 
 3              0.223                 0.301    -0.0403  0.0785
 4              0.209                 0.320    -0.0388  0.111 
 5              0.215                 0.310    -0.0399  0.0944
 6              0.226                 0.339    -0.0512  0.113 
 7              0.150                 0.260    -0.0109  0.110 
 8              0.149                 0.272    -0.00596 0.123 
 9              0.155                 0.240    -0.0112  0.0850
10              0.00531               0.0814    0.0719  0.0761
# … with 9,990 more rows

HDI of Ambush - Active

ggplot(post, aes(d)) +
  geom_density(linewidth = 2) +
  labs(x = "Ambush - Active", y = "Density")

HDI of Ambush - Active

hdi(post$d, ci = 0.89)
89% HDI: [0.06, 0.13]
ci(post$d, method = "ETI", ci = 0.89)
89% ETI: [0.06, 0.13]

The two intervals match here because the posterior of d is nearly symmetric.

Region of practical equivalence

“A difference of zero plus or minus some small amount is for practical purposes ‘not different’ from zero.”

Easystats has a good introduction. Additional reading: Kruschke (2010, 2018); Kruschke et al. (2012).

ROPE

  • bayestestR’s rope_range() automatically suggests a suitable range for comparison
    • Think about what interval you consider practically equivalent
    • Kruschke recommends \(\pm 0.1 \cdot sd_y\) as a default
  • Use the entire posterior (ci = 1) to determine the percentage of samples inside the ROPE

ROPE

rr <- c(-0.1 * sd(Snakes$RPM), 0.1 * sd(Snakes$RPM))
rr
[1] -0.01257294  0.01257294
rope(post$d, ci = 1, range = rr)
# Proportion of samples inside the ROPE [-0.01, 0.01]:

inside ROPE
-----------
0.10 %     
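
The ROPE percentage is just the proportion of posterior draws falling inside the range; by hand, with made-up draws standing in for post$d:

```r
# Hand-rolled ROPE proportion (toy draws; not the actual posterior)
set.seed(2)
d  <- rnorm(10000, mean = 0.095, sd = 0.022)  # stand-in for Ambush - Active
rr <- c(-0.0126, 0.0126)                      # ROPE bounds from the slide

mean(d > rr[1] & d < rr[2])                   # proportion inside the ROPE
```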

ROPE vs. HDI

ggplot(post, aes(d)) +
  geom_density(linewidth = 1.5) +
  geom_vline(xintercept = rr,
             color = "red", linewidth = 1.5) +
  geom_vline(xintercept = as.numeric(hdi(post$d))[2:3],
             color = "blue", linewidth = 1.5) +
  labs(x = "Ambush - Active", y = "Density")

Leave-one-out Cross Validation (LOO-CV)

brms uses a fast approximation to LOO-CV via the loo package (Gelman et al. 2014; Vehtari et al. 2017; Bürkner et al. 2021)

fm_add <- add_criterion(fm_add, "loo")
fm_inter <- add_criterion(fm_inter, "loo")

loo_compare(fm_add, fm_inter)
         elpd_diff se_diff
fm_inter  0.0       0.0   
fm_add   -1.4       2.0   

Model weights

Model weights distribute the models’ relative expected predictive ability so that the weights sum to 1 (100%).

model_weights(fm_add, fm_inter, weights = "loo")
   fm_add  fm_inter 
0.1988786 0.8011214 
  • Interaction model has most of the model weight.
  • Not 100% though (model averaging in Unit 5)
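
These particular weights look like a softmax of the elpd values; reproducing them (approximately) from the loo_compare() output above — an illustration of where the numbers come from, not necessarily the exact algorithm brms uses:

```r
# Softmax of elpd differences reproduces the weights to rounding error
elpd_diff <- c(fm_add = -1.4, fm_inter = 0.0)
w <- exp(elpd_diff) / sum(exp(elpd_diff))
round(w, 3)
```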

No decision is completely objective

  • P-values, confidence intervals
  • AIC, AICc
  • Credible intervals, HDIs
  • ROPEs
  • LOO-CV values

More discussion, detail, examples in Unit 5.

References

Bürkner, P.-C., J. Gabry, and A. Vehtari. 2021. Efficient Leave-One-Out Cross-Validation for Bayesian Non-Factorized Normal and Student-t Models. Comput. Stat. 36:1243–1261.
Gelman, A., J. Hwang, and A. Vehtari. 2014. Understanding Predictive Information Criteria for Bayesian Models. Stat. Comput. 24:997–1016.
Kruschke, J. K. 2010. What to Believe: Bayesian Methods for Data Analysis. Trends Cogn. Sci. 14:293–300.
Kruschke, J. K. 2015. Doing Bayesian Data Analysis: A Tutorial with R, JAGS, and Stan. 2nd ed. Academic Press, Boston, MA.
Kruschke, J. K. 2018. Rejecting or Accepting Parameter Values in Bayesian Estimation. Advances in Methods and Practices in Psychological Science 1:270–280.
Kruschke, J. K., H. Aguinis, and H. Joo. 2012. The Time Has Come: Bayesian Methods for Data Analysis in the Organizational Sciences. Organizational Research Methods 15:722–752.
Vehtari, A., A. Gelman, and J. Gabry. 2017. Practical Bayesian Model Evaluation Using Leave-One-Out Cross-Validation and WAIC. Stat. Comput. 27:1413–1432.